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ABSTRACT 


This thesis is concerned with the implementation of an adaptive identification 
algorithm using parallel processing and systolic arrays. In particular, discrete samples 
of input and output data of a svstem with uncertain characteristics are used to 
determine the parameters of its model. The identification algorithm is based on 
recursive least squares, QR decomposition, and block processing techniques with 
covariance resetting. Along similar lines as previous approaches, the identification 
process 1s based on the use of Givens rotations. This approach uses the Cordic 
technique for improved numerical efficiency in performing the rotations. Additionally, 
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I. INTRODUCTION 


A. BACKGROUND 

The problem of adaptively controlling systems with uncertain characteristics 
depends largely on identification of the unknown system parameters. The ability to 
accurately and quickly estimate these parameters, then, 1s of primary importance. The 
state of computer development today has made this estimation possible in real time. 

The goal of parameter estimation is to best fit an appropriate model to the input- 
output data of the plant under investigation. This immediately leaves us with two 
basic and distinct problems: the choice of a parameter model and the choice of an 
estimation algorithm. 

We desire to select a parameter model which relates input and output data by 


means of weighted parameters, in the form 
y(t) = p(t) 8 + v(t) (1.1) 


with y(t) and o!(t) representing the output and signals respectively available for 
measurements, and where 9 is an array of unknown parameters to be determined. The 
term v(t) represents noise or other modeling errors. Though numerous choices of 
model structures exist, linear models still remain the most desirable due to their 
simplicity and the considerable amount of theory developed to analyze them. 

Secondly, we need to choose an appropriate estimation algorithm. Again, several 
possibilities exist, but the most effective in terms of converging is the recursive least 
squares algorithm [Ref. 1: pp. 49-68]. This algorithm is implementable using on line 
matrix manipulations, and the technique is based upon on line solution of a system of 
linear equations. 

The major drawback of recursive least squares is the computational complexity 
involved. The size of the matrices grows with the complexity of the system to be 
modeled. Although available microprocessors are effective for low order systems and 
slow sampling rates, more complex problem require improved capabilities. These 
capabilities are provided by systolic arrays built using VLSI technology. 


This research analyzes the on line recursive least squares identification algorithm. 
This algorithm processes blocks of data, using values gained from the previous block as 
initializing values, in a method known as block processing. This block processing 


technique is based on QR deconiposition, discussed in the next chapter. 


B. IMPLEMENTATION CONSIDERATIONS 

The particular computing structure we are considering is based on the systolic 
array (or wavefront array). The systolic array consists of an array of individual 
processing cells, each provided with a local memory and processing unit of its own, 
connected in such a way that each cell communicates only with its nearest neighbors. 
The array is designed so that data is continuously clocked or pumped throughout in a 
rhythmic fashion; hence the name “systolic.” The cells are simple in that they are 
required to perform onlv basic mathematical functions on the data received from 
neighboring cells. Special purpose hardware incorporating systolic arrays can be built 
using VLSI technology. The advantage with this technique is the fact that a complex 
Operation is performed by several processors at a time, thus increasing the throughput 
of data. 

Previous authors [Refs. 2,3: pp.29 - 37, pp.255 - 274] have presented parameter 
estimation algorithms using systolic arrays. The general idea has been to solve a 
system of linear equations in two stages: first, by triangularization of the matrix of 
coefficients, and second, solving by successive substitution. The previous algorithms 
have all been similar in that they used a triangular systolic array to triangularize the 
matrix, and a linear systolic array configuration to solve for the parameters. I[t turns 
out that the arrays for the triangular and linear sections are different, and furthermore 
the linear section requires operations such as divisions which are hardly implementable 
by simple processor operations. 

In this thesis we present an algorithm which is based on two identical triangular 
sections. It is characterized by the fact that only orthogonal operations are involved, 
thus making the algorithm numerically more stable, and easily implementable by simple 
shift and add operations. Also, one of the advantages is that only two different types 
of cells are necessary (as compared to four in the previous implementations), resulting 
in a simplification in the manufacturing process. These cells perform different 
functions from those used by the above referenced authors. The Cordic technique is 
used by the cells to perform vector rotations, resulting in improved numerical 
efficiency. Though more total cells are required in this implementation, the cost of 


additional cells in a VLSI scheme is considered to be minimal. 
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The use of fixed point versus floating point arithmetic is considered during this 
investigation. Because fixed point operations are based on simple shift functions and 
finite registers, which are simple to implement, it seemed advantageous to use fixed 
point values. However, since input and output data do not naturally appear as integer 
values, there was concern over loss of accuracy due to necessary scaling and 
truncation. With proper choice of scaling factors, it will be shown that the integer 
methods perform as well as floating point. 

This thesis is divided as follows: Chapter II discusses the methods of system 
identification, 1.e. solution of svstems of linear equations, QR decomposition, and 
recursive least squares; Chapter III discusses the Cordic technique and _ the 
implementation of the svstolic arrays. Chapter IV discusses the simulation results, and 
Chapter V draws the final conclusions. A listing of the program used to simulate the 


systolic arrays 1s found in the appendix. 
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Il. SYSTEM IDENTIFICATION METHODS 


A. LINEAR SYSTEM MODELING a 

The first step in any parameter identification process 1s to model the system in 
mathematical terms. To this end, consider a general system with a single input u(t) 
and single output y(t), as shown in Figure 2.1. 





Figure 2.1 Simple Linear System. 


This system can be described by the linear, constant coefficient difference equation 


y(t) + ay(t-l) + .. + a y(t-n) = bu(t-1) +... + b u(t-m) + v(t) | (2.1) 
where the a;’s and the b;’s are real constants, and the equation (and system) 1s of nth 
order. The v(t) term denotes a noise variable. Equation (2.1) represents a class of 
discrete systems, known as recursive systems because the output depends not only on 
the input but on the previous output values also. 

An alternative way to represent the above equation is by defining the parameter 
vector as 


tT — 
8° = (2), ag, + »2,, by by, .. .b (2.2) 
and the regression vector of input-output data as 
p! (t) = [-y(t-l),.. ,-y(ten), u(t-1), ... , u(t-m)] (2.3) 
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Now, using (2.2) and (2.3), we can rewrite (2.1) as 
y(t) = O'(t) @ + v(t) (2.4) 


The vector 8 contains the parameters which we want to determine. ‘This can be 
obtained by sampling the system at some sample frequency (say, @.) and accumulating 
a numerical sequence that describes the system at discrete time intervals. We can 
express this in the form of a system of linear equations by considering the sequence at 
several instants of time. Ideally, if the number of samples of u(t) and y(t) (i.e. the 
number of equations) equals the number of unknowns (m + n), we should be able to 
solve exactly for 8. However, it is normally the case that the number of equations is 
greater than the number of unknowns. This system may or mav not have a solution. 
Ideallv, in the noiseless case (v(t) = 0) a solution exists, regardless of the number of 
equations. However, since noise and numerical errors are present, we look for the least 
squares solution of the system of equations, 1.e. for the solution which minimizes the 


error 
(8) = || A@- b ||? (5) 


This always exists, although it might not be unique. 


B. SOLUTION OF SYSTEMS OF LINEAR EQUATIONS 
In the previous section we saw how the linear system could be modeled by (2.4). 


When numerous samples are taken, we end up with a system of linear equations, such 


as 
y(t) = g(t) @ + v(t) 
v(t-1) = @l(t-1) 0 + v(t-l) 
y(t-2) = @l(t-2)0 + v(t-2) 


This system of equations can be expressed in the form 


A®@=b (2.6) 
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where Ae ZA™ Ge RR" be R™. Now, when m = n, and A is full rank and 


invertible, we can solve uniquely for @ as 
9 = Alb (2.7) 


However, in general we find that m > n (i.e. more equations than unknowns). In this 


instance, the solution of (2.7) is defined in the least squares sense as 


Q@. where ||A@, - bi] = min, ||A@ - b]| (2.8) 
Although (2.8) can be solved by pseudoinverse, this technique is not attractive due to 
the presence of matrix inversion. An alternative way is to triangularize A in (2.6) and 
then solve for @ by successive back substitutions. This, of course, is the basis of 
Gaussian elimination techniques. 

But now we face the dilemma of triangularizing an array when m > n. The 
solution is to triangularize the upper part of the A matrix, leaving one or more rows of 
zeros at the bottom as a result. This will permit us to solve for @ in the least squares 
sense, as indicated in (2.8). This is known as QR decomposition. (For review of least 
squares estimation in statistical theory, the reader is referred to the literature.) 


C. QR DECOMPOSITION 

A means to solve for the least-squares solution of a system of linear equations is 
provided by the QR decomposition of a matrix [Ref. 4: pp. 209-215]. Consider again 
Equation (2.6) with m > n, where 8 = 0. (i.e. least squares sense). We can factor the 


m X n matrix A as 


A=QB , (2.9) 
where Q is am X n orthogonal matrix such that Qqt = |, and & is defined as 
fe (2.10) 


R = |--- 
= 0 
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where R is ann *X n upper triangular matnx, and 0 represents a zero matrix. Now we 


can rewrite (2.6) as 


B 
“a 9=Q'b= nie (2a) 
0 i By ‘ 


And finally, from (2.11), it follows that 


RO=68, (2.12) 


Now, the solution of (2.6) in the least squares sense is the same as the solution of 
n equations and n unknowns in (2.12). 


Proof. Consider the set of equations 


e = A®-b = QRO- 


is 


where e represents the least squares error vector. Then, 


eTe = (pb! - OTAT) (AO - b) 
Q'e = RO-Q'b = RO-B 


Then, we have 


eke = eTQQTe = || RO- 8B II? 











ooltele tel 
“WA 








= || R@- 8B, Il? + | By II? 
We see that B, is independent of 8. Therefore, le}|? is minimized when 
= a = 
6, = R°B, or RO, = B, ° 
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Based upon these results, we see that we can now solve (2.12) for @ by back 
substitution. | 

There are a number of methods available that can be used to compute the upper 
triangular matrix R. In particular, the Givens rotation [Ref. 5: pp. 497 - 499] is 
attractive for systolic array implementation since it 1s based on manipulation of pairs 
of adjacent rows, thus requinng minimum broadcasting of data. Cordic techniques are 


used to implement these rotations, and are discussed in the next chapter. 


D. RECURSIVE LEAST SQUARES 
The problem now 1s to determine a recursive algorithm that can be used to 


estimate the parameter @ in the model 
y(t) = pi(t) @ (2.13) 


We desire to estimate @ from a set of measured data y(t) and g(t). In the least squares 
sense we choose this estimate by best fitting our model to the available data. That is, 


we wish to minimize €, where 
T A 
le] = ly(t)- @ (t) 9 (2.14) 
A e ° e A * e 
8 refers to the estimate of @ at any time t. In particular 9, satisfies the equations 


er(t-1)| ——[y(t=1) 


A=) + A(t-1)6, = y(t-1) 
47 (0) y(0) 


(2.15) 


A 
It is possible to.show that 9, can be recursively computed as (Ref. 6: pp. 17 - 22] 


M(t) (v(t) - 87 @()) 


O41 = 8, + P(t-1) + ot(y Pl) oy (2.16) 
where P(t) = (A(t) FA(t))! satisfies the recursion 
T « 
P(t) = P(t-1) - P(t-1) Oy WED (2.17) 


1 + g(t) P(t-1) g(t) 
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To this point, we have ignored the problem of the existence of the solution. If 
A(t) is singular, then P(t) does not exist. Furthermore, we can not initialize A(-1) = 
because this would leave P(-1) undefined. The solution is to incorporate initial 
conditions into A(t) so that P(t) can be computed at each t. Choose A(-!) = oy], 
where 6, > 0 is some arbitrary constant, and I is the identity matrix of appropriate 
dimensions. Then, by algebraic manipulation, Equation (2.15) becomes 


¢*(t-1) y(¢-1) 

| (2.18) 
or(0) | * | y(0) 
}A(-1) | -  JA(-1)8 


The solution to (2.18) can be made fully recursive using (2.16) and (2.17) and 
appropniate initial conditions. 


E. BLOCK PROCESSING AND COVARIANCE RESETTING 

In the previous section, an algonthm was described that allows us to estimate the 
parameter 9 recursively. However, note that from the definition of P(t) (also known as 
the covariance matrix) 


P(t) = (AT(t) A(D)! = (Bg + D ot) O'() (2.19) 


that P(t) — 0 ast — © [Ref. 6: p. 21]. Therefore, the algorithm loses sensitivity as t 
increases, and later values of 8 may not be as accurate as earlier values, especially if 
our model changes with time. There are two possible remedies to this problem: the use 
of a “forgetting factor”, or the block processing approach. 

The forgetting factor minimizes the error 











(t)|[ = y AF (y(k) — 67(k) 8144)? + 06 | Bra @0| | (2.20) 


k=0 


where 0<X<1 is the forgetting factor. This has the effect of assigning a higher weight 
to more recent data. 


Ie 


The block processing approach, on the other hand, divides the time set into 
segments of equal and fixed length N as in Figure 2.2. At the end of each time 





Figure 2.2 Time Line. 


block, we reset the covariance matrix P(t). Although it can be reset at any time, for 


convenience we choose the end of each block, and P(t) now becomes 


(Oe ; . (2.21) 
Equation (2.17) otherwise 


Therefore, the beginning of each time block 1s treated as a new, initialized period. 

It has been shown [Ref. 2: p. 20] that an external input u(t), sufficiently rich in 
frequency (n sinusoids), together with blocks [, of sufficient length N provide a 
guarantee for an estimation 8 > 6° where 0° represents actual parameters. The effect 
of various lengths of N will be investigated later. 

If we apply the considerations above to the general case, we can see that the 


parameter estimates at the end of each time block are given by 


¢1((k+1)N-1) y((k+1)N-1) 
sty [fom = |i an 
Col O09 uN 


This will be the foundation with which we build our recursive parallel algonthm. 
Although the algorithm presented in this chapter assumes a single input, single 
ouput (SISO) plant, it can easily be extended to the multiple input-output (MIMO) 
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plant [Ref. 7: pp. 25-27]. Additionally, we assume our plant to be causal. 
Furthermore, because the problem of order determination is necessarily complicated, it 


will be assumed that the order of the plant is always known to the designer. 


Ill. SYSTOLIC ARRAY IMPLEMENTATION 


A. IMPLEMENTATION OF ALGORITHM 

We now turn our attention to the solution of (2.22) using systolic arrays. In 
particular, (2.22) is suited for parallel computation. As described in [Ref. 2: pp. 23 - 
25], the identification problem can be reconstructed into a set of linear equations as 


R, 8 


Katyn = Bey Sap) 


with R, in upper triangular form. We see here that 8 +N Can be computed by using 
two processors in cascade: one to compute R, and B,,, and the second to compute 8 
from (3.1). Note that implicit matrix inversion is not necessary since R, is in upper 
triangular form. 

As discussed in the previous chapter, we initialize the array at the beginning of 
each time block to oI (left half) and o,8,,; (right half). This has the effect of 
initializing the R, matrix in (3.1) to an upper triangular form. Then, at each discrete 
time n, an array of data @(n) and y(n) will be passed to the systolic array. The task of 
the array is to “re-triangularize” the data at each clock pulse, so the matrix remains in 
the form prescribed by (3.1). It is then a simple matter to solve for DK + 1)N’ This 
technique is based on QR decomposition as the means to triangularize the data array. 

The value of Op relates to the confidence we have in the initial estimates. The 
larger the value of 6p, the lower our confidence. Equations (2.18) - (2.22) illustrate the 
role that Gp plays. 


B. GIVENS ROTATION 

An attractive and easily implemented method of triangularization by QR 
decomposition is provided by the Givens rotations. The reason for this choice is the 
fact that the Givens rotation operates only on adjacent data, making it suitable for 
systolic array implementation. The object is to combine two adjacent rows in the 
matrix, forcing zeros in the appropriate positions so to obtain an upper triangular data 
matrix. Figure 3.1 shows an example of triangularization by successive Givens 


rotations. 
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Figure 3.1 Example of Row Operations. 


The underlying idea for this scheme is that any two rows of elements a., .. a. n 


ati Ae pg MAY be considered as a sequence of vectors (x,y) in R* as in Pigune 3-2. 





Figure 3.2. Vectors in R?. 


If we rotate a vector (a; |. ay tj) by an angle @ so to make it parallel to the x axis, then 
the y value (a. +14) becomes Zero. This same rotation @ 1s then applied to the 
remaining (x,y) vectors in the affected rows (in this case, rows 1, i+1) to ensure 
algebraic equality. Next, the sequence of rotations is repeated for the remaining pairs 
of rows (i.e. rows i+ 1, i+2; i+2, i+3; ... ) in order to force zeros in the correct 


locations that leave an upper triangular matnx. 
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Recall that the data matrix is initialized to an upper triangular form (691). Then, 
as we take samples from the signals in the plant, new values of p(t) and y(t) are added 
to the matrix. By a sequence of Givens rotations, we can transform it into an upper 


triangular matrix. An example set of rotations is shown in Figure 3.3, where the 


O) saya 


422 
0 


0 a 


at} al 2 ra ti 8 





Figure 3.3 Example of Givens Rotations. 


operations performed by the systolic array at each clock pulse are illustrated. This ts 
repeated until t = KN (ie. at the end of the time block), at which time the parameters 
@ are solved for, the matrix is reinitialized, and the process is repeated. 


The basis of the Givens rotation ts the matrix 


I 0 0 
Qip.q) =}90 rp.q) 0 (3.2) 
0 0 4, 


associated to each pair of indexes p,q € (1,n+1), with I, and I, being identity matrices 
of dimension (q-2) X (q-2) and (n+ I-q) x (n+ l-q) respectively, and ris a 2% 2 matrix 
of the form 


(pau (p,q) —_S(p,q) 3.3) 
-S(P.q) (p,q) 


The matrix Q has the property that it affects only two rows at a time, i.e. rows q and 


q-l. Application of the transformation Q(p,q) to any matrix A of appropriate 


dimension yields 


Zz 


Xe x 
Qip,qA =] x I x (3.4) 
x 0 x q 
X XX 
4 
: -» 
provided c(p,q) and s(p,q) in (3.3) are such that 
C(P.q) a..4, + S(P.4) a, = | 
(or) 
¢(p.q) 4g, — S(P.4) ag.4,, = 0 


In Figure 3.3, application of the Givens rotation Q(3,4)Q(2,3)Q(1,2) to the left 
matrix results in the rotated matnx shown on the right. We see that to perform Givens 
rotations, we must be able to calculate addition, subtraction, multiplication, division, 
and squares. Next, we will see how to perform the rotatrons, by using add and shift 


operations only. 


C. CORDIC TECHNIQUE ' 

The CORDIC (COordinate Rotation DIgital Computer) technique [Ref. 8: pp. 
162-165] is particularly attractive in fixed point arithmetic for generating different types 
of trigonometric functions. The principle involved is to rotate a vector (x,y) through 
an angle @ by a series of “properly quantized” angular steps. The single rotations of 
the vector (x, y) are computed at each step by a combination of simple add, subtract, 
and shift operations. 

Consider again a vector in the plane as in Figure 3.2. The vector is represented 
by its x and y components. Rotating the vector through any angle 6 leaves us with 


new rotated coordinates 


xcos6 + y sind 3.6) 


“<i 
ll 


ycosé #xsin6d 


where the top sign refers a clockwise rotation. 


23 


In the CORDIC technique, the rotation is performed in a sequence of angular 
steps Q),, such that the sum of them approaches 6. The W'S can be positive or 
negative, so 

d=, +0, +..+0, } es 
If we define y, = +1, then we can express 6 as 


6 = 2. 7,0, (3.8) 


The choice of the angles w, is determined on the basis of computational convenience. 
In particular, choose 


= -1 7521 as 
w, = tan” (27) os oe (3.9) 


and consider a single rotation by an angle w.. By applying (3.6) and dividing by cos @, 








we obtain 
x; mn 
OS Coit ae cena eee 
_ (3.10) 
os zt 
we fee a 
i 


The factors x,'/cos @, and y,’‘cos @. are the rotated components of the vector (x,y). 
Note that the new vector is not only rotated, but also scaled by a factor I/cos Q.. 


Now, combining (3.7) (or (3.8)), (3.9) and (3.10) we can define a recursion 


xl ieee yi2" ie vy;2" 
Yieit 3, ice een 


K-x 


aml 
Ky’ (3.11) 


for the rotation by an angle y.@,. In binary arithmetic this can be implemented with 
simple shift and add operations. Figure 3.4 illustrates a typical system for the 


determination of y.. 





Figure 3.4 General Block Diagram of CORDIC Computation. 


We can easily compute values of w., i = 0,1,2,.. and the corresponding scaling 
factor k, as given in Table I. ” 

For purposes of this research, we want to rotate an initial vector until its y value 
is equal to zero. The algebraic sum of the sequence of predetermined rotations will 
give us the angle 6 that the vector was rotated through. These rotations can be 
encoded into a’ binary sequence y that identifies the rotations performed. This 
sequence of rotations can then be passed to the remaining vectors in the affected rows. 
Figure 3.5 illustrates a series of rotations for an arbitrary vector which we desire to 
rotate by 30°. Note that in the figure, the desired angular value is approached quickly, 
because w, decreases by approximately half during each step. The y sequence in the 
fiyuresvoudupe (+ 1,—1,+1,—1,+1,+ |, suletel. — 1,+1,— 1). 

In an ideal case as just presented, the value of |y| decreases during each step. 


However, this may not always be the case. For example, consider the vector 
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TABLE 1 
THE BINARY CORDIC CONSTANTS 
2H o, 
(degrees) 
1.0000000000 45.00000000 1.4142135 
0.5000000000 26.56505124 1.581138826 
0.2500000000 14.03624340 1.629800596 
0.1250000000 7.12501632 1.642484060 
0.0625000000 3.57633432 1.645688908 
0.0312500000 1.7899 1064 1.646492240 
0.0156250000 0.89517384 1.646639215 
0.0078 125000 0.43761428 1.646743467 
0.0039062500 0.22381056 1.646756030 
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Figure 3.5 Set of CORDIC Rotations. 


(x,y) = (5,1) and @ = 11.3°. The first rotation in the CORDIC algorithm (see Table 
1) would be -45°: this gives us a rotated vector (x,v) = (4.2, -2.8) and @ = -33.7°. 
Note that the value of |y| has actually increased. To make the algonthm more 
efficient, we modify the sequence y to include the value zero. Whenever a rotation 


causes |y| to increase, we do not perform it, and we set the corresponding y, to Zero. 


—- 
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Then we continue on with the next rotation value and repeat the process. For the 
example just cited, the next value to be tried would be 26.6”. 


A simple algorithm to perform the CORDIC rotations can be seen in Figure 3.6. 


Pe AYZE? itx > Othen x, = x; a 


else Gy) oe a ay 


Y 


“Swcecal S5 i li | 
else ¥p = —1 


fort;— Oto N= Ido (* number of iterations *) 
ify, — ¥,2"x, > y, then (* {y| increases *) 


X X 


ia oy 
itd i 
ot a 
else 
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end for; 


end. 





Figure 3.6 Algorithm for CORDIC Rotations. 


D. SYSTOLIC ARRAYS 
1. General 
We now examine the parallel structure that will be used to solve the least 
squares algorithm described above. We desire a structure that accepts a sequence of 


regression vectors @(n) and signal y(n) as input and then outputs the estimate for 9. 


oa 


Specifically, we are interested in a high performance parallel strucuture that can be 
implemented directly as a hardware device in order to deliver maximum throughput. 
Svstolic arrays represent a structure suitable for these characteristics. 

The key to inexpensive implementation is simple and regular interconnections. 
Additionally, we want to allow the computations to proceed concurrently with the 
input, in order to maximize the throughput. This is known as pipelining. [Ref. 3: p. 
257] 


cnc) p, (t) ® (t) 





Figure 3.7. Systolic Array. 


A systolic array meets these requirements. Figure 3.7 shows a typical system. 
As noted earlier, the array is simply a network of processors that are regularly 
connected. The data is continuously “pumped” through this structure, thereby 
minimizing overall execution time, since all the processors work in parallel. 

It was shown that two processors would be necessary to solve (3.1). Previously 


used configurations have consisted of-a triangular array (as in Figure 3.7) to compute 
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the upper triangular matrix, and a linear array to solve the system of equations. Figure 
3.8 shows the typical design for the case where d (dimension) = 4. This thesis 
discusses an alternative configuration in which the linear section is replaced by a 
second triangular section identical to the first one. This new design is discussed later in 
this chapter. Both designs use a single clock signal to control operations. We will now 
review the structures of the triangular and linear sections before continuing on to 
discuss the alternative design. 
2. Triangular Array 

The triangular systolic array performs the rotations as described in the section 
on the CORDIC technique. The processors work simultaneously at each clock pulse. 
The data regression vector @(t) and output signal y(t) is input to the top of the array, 
and rotations are calculated at each clock cycle. 

The triangular array consists of two types of cells: edge (or boundary) cells 
and internal cells. The edge cells are represented by the circles in Figure 3.7 or 3.8; the 
boundary cells are the squares. Figure 3.9 defines the operations of these cells. The 
edge cells compute the rotation vector y, which consists of a sequence of +1 or 0 as 
discussed previously. This vector y is then passed to the internal cell on its right. The 
internal cell then rotates its (x,y) vector by the value specified by y. These operations 
are performed down the length of the affected rows. 

Each cell of the triangular array stores an element of the upper triangular 
matrix R(n) from Equation (3.1), and it is initialized to zero for internal cells and to 
O,! for the edge cells. Then, each row of cells in the array is used to combine one row 
of the stored matrix with the data received from the above cells. As discussed 
previously, the array maintains its upper triangular form throughout the computations. 

A delay of one clock cycle per cell is incurred when passing the rotation 
parameters along a row. Therefore, it is necessary to “skew” the input data as seen in 
Figure 3.8, so that the input data interacts properly with the previously stored 
triangular matrix. Because the cells are all operating simultaneously, the data in the 
system at any time t consists of values from (2n) different matrices. Figure 3.10 
demonstrates this for a system with n=3. In the figure, we can see that at time (t+5), 
there are also values present from the five previous matrices (Le. tt+4, t+3,...,t). In 
order to get all the cells in the array to a similar time state, the array would have to be 
clocked an additional 2n-1 (five) cycles, feeding zeros as input where necessary. At the 


completion, all cells will be at the same time (t+ 5). 
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Figure 3.8 Systolic Array Implementation for Parameter Estimation: Old Design. 
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EDGE CELL 


x 


x’ is computed recursively from 


~j} A 
e = e a2 e 
= tet xe y; 


+1 if y; > 6 


7; 6 if y, = 6 or the rotation 
would increase |y| 


if y; < 6 


INTERNAL CELL 


x’ is computed recursively from 


x 


ei, a ea 
awe fen 


y”’ is computed recursively fron 


i 
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(over the sequence 7, ) 





Figure 3.9 Definition of Cell Operations for Triangular Section. 


Note that at the same time the triangularization process is being carried out, 
the column vector Bey is also being computed by the right side column of internal cells 
using y(n) as its input. At the end of the triangularization period (N), we are ready to 
solve for Ox + 1)N’ The data in this triangular array is clocked out to the next array 
section that will compute the parameters. 

3. Linear ‘Array 

The linear systolic array has been used in previous implementations to solve 
for the estimated parameters. The linear section consists of one boundary cell and 
(d-1) internal cells as seen in Figure 3.8. The operation of the cells as they compute 
the parameters are shown in Figure 3.11. Note that the cells are different from those 
in the triangular array, increasing to four the number of unique cells necessary in the 


combined system. 
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Figure 3.10 Data Flow in Triangular Section. 
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Figure 3.11 Definition of Cell Operations for Linear Section. 


It is shown in [Ref. 2: p. 36] that the time required to solve for 9 using the 
linear array is equal to 2d. At the end of that period, the parameters are used as initial 
values for the triangular arrav, and the triangular section again commences operation. 

We now turn our attention to the replacement of the linear section by a 
second triangular system, and discuss the differences between the two designs. 

4. The Use of a Second Triangular Array as the Solution Section 

An alternative implementation can be obtained by solving for 8 as shown in 
Figure 3.12. In this implementation, at the end of the triangularization period, the 
data is passed from the first triangular section to the second triangular section in a 
reversed fashion. The second array performs the same type of operations as the first, 
therefore the cells are identical. 

The key to using another triangular section is that, by proper combinations of 
rows, we can force zeros into all elements of a given row but one, so that we can solve 
for each of the parameters. Also, the fact that orthogonal operations are used makes it 
more robust in the presence of numerical errors. 

To see how this works, consider an arbitrary set of equations in upper 
triangular form as in Equation (3.12). Now, x, is simply solved as b,/a3. To solve for 


X,, We can force a,, to zero by a linear combination of rows two and three. We know 
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Figure 3.12 Systolic Array Implementation for Parameter Estimation: New Design. 
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2iyX) 7 4yoX, °F aX, = Dy 
Ox, + a,x, + a5;X; 9 = b, (3-12) 
Ox, ae Ox, = 233%; b, 


this can be easily accomplished by Givens rotations. Then X, is found to be b,’a,,. 
Similarly, by row operations on rows 1, 2, and 3 we can make aj5 = a,,; = 0 in row 
one. Again, solving for x, is a simple operation: b,‘a,,- These type of operations are 
exactly what the triangular array was designed to do. 

Figure 3.13 illustrates these operations in matrix format. Note that the array 


is initialized to all zeros here, whereas the first triangular section is initialized to 5,9 
and Gol. 


KN 


compute 6, compute 8, compute 0, 





Figure 3.13 Solution of System of Equations. 


To see how the svstem operates, recall the first triangular array at time N. 
Now we must feed the data down into the second triangular section in an appropriate 
manner so that we can solve for the parameters. The same delay (one clock cvcle per 
cell) in propagation of data applies to this triangular section as it does in the first 
section. Hence, we must carefully choose when to sample the array in order to get the 
correct values with which to calculate the solution. 


Figure 3.14 shows the data flow for a simple system where d = 3. The input 
data is skewed as it was in the triangular section. It can be seen that the values a,,, 


a5, and a,, are available at times N+1, N+4, and N+7 respectively. Similarly, 
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Figure 3.14 Data Flow Through Second Triangular Array. 


values b,, b,, and b, are found at times N+4, N+6, and N+8. In general, for any 
size system n, the coefficients a.. and outputs b, are available as seen in Table 2. Note 
from the figure that the coefficients are “picked off’ from the edge cells at the 


appropriate times, while the outputs are found in the rightmost set of internal cells. 


TABLE 2 
TIMES OF AVAILABILITY OF DATA 


coefficient time available 
at N + 1 
N+ 4 
N + 7 


ereienet 


an -2.n-2 


IN tear. 1) 
mee (i + 3) 
Neat aS) 





This new design operates slower than the linear system seen in [Ref. 2: p. 36]. 
The time to solve for @ in this design is equal to the time until b, appears, which is 
equal to (3n - 1). This compares to 2n (or 2d) in previous implementations; hence, n-1 
more clock cycles are required. 

On the other hand, simplification is gained in the manufacturing process since 
the number of unique cells is reduced. Additionally, use of the CORDIC rotation 
technique allows us to use simpler processors. The tradeoffs to be considered are 


simplicity (cost) versus speed. 
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IV. SYSTOLIC ARRAY MODELING 


A. GENERAL 
1. System Equations 
We now study a particular “unknown” system and the performance of the 
systolic array in identifying its parameters. We will simulate a system both with and 
without noise present. Additionally, we compare the results obtained by using floating 
point versus fixed point arithmetic. 


For purposes of the simulation, consider a plant with discrete time transfer 


function 
H(z) = ae (4.1) 
(z-0.5) 
which corresponds to the linear difference equation 
v(t) — L5y(t-1) + 0.75y(t-2) — 0.125 y(t-3) = 1.3u(t-1) (4.2) 
In particular, let the input sequence be 
u(t) = sin(3mt/10) + sin(3mt/5) + sin(37t/2) + sin(97t/5) (4.3) 


The dimension of the parameter vector is four, defined as 8 = [-1.5, 0.75, -0.125, 1.3]. 
In the parameter estimation problem, these values are assumed to be unknown. The 
input is “sufficiently rich” in frequency (minimum of n sinusoids) to excite all modes of 
the system [Ref. 1: p. 74]. Results of the simulations are discussed in the following 
sections. 
2. Choice of Initial Values 

For the recursive algorithm, recall that we need to initialize the systolic array 
with a parameter estimate 0,0(0) and 6)J. The value of 6, which is related to the 
confidence in the initial estimate as discussed in Chapter III, is chosen to be one for all 


simulations. If some information about @(0) is available, it should be used when 
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determining appropriate initial conditions. In the absence of any prior knowledge of 9, 
we choose 0(0) = 0. 
3. Choice of Block Length 

We have chosen four different block lengths during the simulation studies: 
N = 3,5, 10, 15. It will be seen in the noiseless case that the parameters’ exhibit the 
fastest rate of convergence when N = 5. However, when noise is present, there is a 
tradeoff: the system is more sensitive to disturbances when the block length is shorter. 
This is because a longer time block provides for more time averaging, thus attenuating 
the effects of noise. Therefore, with more samples available, disturbances have a lesser 
effect on the estimates. Hence, we must make an informed decision about what trait is 
most important in a specific application. 

4. Fixed Point versus Floating Point 

In the sections that follow, we also compare the performances obtained when 
using floating point processors or fixed point processors. Fixed point arithmetic 
Operations are performed using finite registers and simple shift operations. Therefore, 
they are simpler to implement than floating point operations. Additionally, floating 
point operations in general require longer registers (exponent and mantissa) to 
represent numerical values, which might add complexity to the processor. Hence, the 
simplicity of fixed point arithmetic is desired. 

The problem to be solved in the fixed point case is how to convert the input 
data values, which we normally expect to be floating point, into fixed point values. 
The answer, of course, is to appropriately scale all numbers so that they stay within the 
limits of the fixed registers. This task would be assigned to the Analog/Digital (A/D) 
converter, and the scale factor used would be in large part dependent on the range of 


values of the discrete plant samples. 


B. SYSTEMS WITHOUT NOISE 

The system under consideration was first modeled in an ideal environment 
without noise to verify convergence of parameters. Figures 4.1 through 4.8 display the 
results of these simulations. In the figures, the estimated parameters (@,, 0,, Q,, 0.) 
are plotted along the vertical axis, and the block number is plotted along the horizontal 
axis. Block number simply indicates the number of blocks that have been completed, 
where N identifies the block size. Specific results for the floating point and fixed point 


systems are discussed below. 
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Figure 4.1 Floating Point, Without Noise, N = 3. 
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Figure 4.2 Floating Point, Without Noise, N = 5. 
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Figure 4.3 Floating Point, Without Noise, N = 10. 
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Figure 4.4 Floating Point, Without Noise, N = 
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BLOCK NO. 


1. Results Using Floating Point Arithmetic 
Figures 4.1 through 4.4 show the floating point results for block lengths of 
N = 3,5, 10, and 15 respectively. In order to evaluate and compare the relative rates 
of convergence, we estimate (by visual inspection) in each case where the parameters 
appear to have converged to their correct values. These values are tabulated in 
Table 3. It can be seen that with N = 5, the least number of clock cycles were 


required. Therefore, in this particular case, we should choose a block length of five. 


Aiea 


TIME TO CONVERGENCE 
(NOISELESS CASE) 


BLOCK ee BLOCK 


AT CONVERGENCE 
3 16 
5 6 
10 
15 





2. Results Using Fixed Point Arithmetic 
The results for the fixed point system are shown in Figures 4.5 through 4.8. 
Note that the parameters converge just as rapidly as they did in the floating point 
implementation. (The simulations were run on an IBM 3033 mainframe, in which 
there are 31 bits available for an integer number.) This indicates that the systolic array 
with fixed point processors performs as well as the floating point system. This 1s a 
distinct advantage when we consider the simplicity of fixed point processors as 


discussed previously. 


C. SYSTEM WITH NOISE 

The system was next modeled with noise present. The noise term v(t) is a 
sequence of independent random variables identically distributed with zero mean (as 
white noise). We use a variance of 0.5 for these noise random variables. The noise 1s 
added to both the incoming signal u(t) and measured output y(t), as would be expected 


in a real system. Figure 4.9 illustrates these signals both with and without noise. 
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Figure 4.5 Fixed Point, Without Noise, N = 3. 
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Figure 4.6 Fixed Point, Without Noise, N = 5. 
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Figure 4.7 Fixed Point, Without Noise, N = 10. 
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Figure 4.8 Fixed Point, Without Noise, N = I5. _ 
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Figure 4.9a Input and Output Signals Without Noise Present. 
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Figure 4.96 Input and Output Signals With Noise Present. 
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1. Results Using Floating Point Arithmetic 
Figures 4.10 through 4.13 display the results for the four different block 
lengths. With N = 3, noise causes the parameters to vary considerably. When N = 5 
or 10, disturbances are less likely to affect the parameters. When block length is equal 
to 10, the parameters have converged reasonablv well within three blocks (or a total of 
30 cycles). For N = 15, we see that the parameters are least affected bv the noise, as 
expected. Here we find the relative convergence time is about three blocks, or 45 
CVCleS: 
2. Results Using Fixed Point Arithmetic 
The results for fixed point implementation are shown in Figures 4.14 through 
4.17. Again we note that they exhibit the same performance as in the floating point 


cases. Effects of block length are the same as noted previously. 
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Figure 4.10 Floating Point, With Noise, N = 3. 
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Figure 4.12 Floating Point, With Noise, N = 10. 
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Figure 4.13 Floating Point, With Noise, N = 15. 
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Figure 4.14 Fixed Point, With Noise, N = 3. 
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Figure 4.15 Fixed Point, With Noise, N = 5. 
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Figure 4.16 Fixed Point, With Noise, N = 10. 
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Figure 4.17 Fixed Point, With Noise, N = 15. 
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VY. CONCLUSIONS 


The estimation of parameters using a parallel structure has been described in the 
preceding chapters. In this chapter, we discuss tradeoffs, advantages, and 
disadvantages of the various systems we have investigated. 

Several possible conditions have been simulated in order to investigate the 
behavior of the parallel algorithm. We saw that the parameters converged well under 
all conditions, even in the presence of noise. Use of block processing tends to average 
out the effects of noise, perhaps at the slight expense of convergence rate. 

A parallel algorithm suitable to hardware implementation has been presented in 
this research. The main contribution which distinguishes this algorithm from others 
available in the literature is the fact that we have replaced two different processors by 
two identical ones. With previous designs, a total of four different computing cells 
were required: two for the triangular section and two for the linear section. For the 
new design, we see that we need only two types of cells. These are the edge and 
boundary cells, whose operations were described in Chapter III. However, this new 
system also needs more total cells. Specifically, it can be shown that an additional 
’2d(d+1) cells are necessary, where d = dimension of system. Even for a sixth order 
system (d= 6), this requires only 21 additional cells. In a VLSI scheme, additional cells 
are considered inexpensive. 

The second triangular section 1s somewhat slower than the linear section. We 
saw in Chapter III that (d-1) additional clock cycles were required to operate the 
second triangular section. 

The functions of the cells must also be considered. The use of the Givens 
rotation by Equation (3.2) requires the use of a processor which can perform squaring 
operations, obtain square roots, multiply, divide, and do simple add and subtracts. Use 
of the CORDIC technique greatly simplifies the operations, in the sense that rotations 
can be implemented by use of addition and shifts only. Note from Table | that after 
rotation, the vector has increased in magnitude by the amount listed in the column k.. 
This is easily compensated for by performing a right shift (division by 2) every other 


rotation or so. 
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Another important consideration is the use of fixed point versus floating point 
arithmetic. We saw in the simulation that the simple fixed point processors perform 
equally as well as floating point processors. 

When the results of this thesis are considered, we see that a systolic array using 
the CORDIC technique presents an attractive means of implementing the parallel 
algorithm to identify unknown system parameters. This, in turn, leads to important 


applications in adaptive control systems and real time identification problems. 
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APPENDIX 
PROGRAM LISTING FOR SYSTOLIC ARRAY SIMULATION 


This program calculates the parameter estimates for a discrete time linear system. 
[t is Written in Waterloo Pascal. In accordance with Pascal syntax rules. the size of the 
arrays must be defined in the declarations. Hence, one must Know the order of the 


svstem to be modeled before attempting to use this program. 


program systolic (input,output); 


const 
dimension = 4; plusone = 5; (* dimension is order of system *) 
eemmacoune = 30; ts anticipated max number of rotations *) 
lmerstop = 15; used as index in main * 
type 
gamma_vector = record 
pi: boolean; 
scale_factor: real; 
Panam array (.0..gammacount.) of integer; 
end; 
Beene hha y = array Mien cei 1..plusone.) of gamma_vector; 
heta_array = aay -1..dimension.) of real; 
u_array = array ‘ . dimension, pecene. | of real; 


a_array = array (.1..dimension, 1..plusone.) of real; 

var 
gamma: gamma_array; (* direction of rotations “I 
gamma2: gamma_array; (* direction of rotations *) 


a: Aa_array; 

a2: geal ay 

theta: theta_array; 

u: U_array; 

Ze array: 

Wok olock length, count, index: integer ; 
timer: integer; (x timer variable ) 
Slgma: real; 

ch: char; 

h,m: text; 


RAK AKKKRKRK RIKER KK IKK KKK KIRK AKAIKE IRI IK III IKK IARI AAAK ARE ) 
Procedure initialize (var sigma: real; var block_length:integer) ; 


var? oS 
1) sen eee le: 


begin . 
for i := 1 to dimension do 
for j:= 1 to plusone do 
a2(.i,j. — QO; 


writeln; eens | 

writeln ('Initialize the systolic array'); 

writeln (m,'Initialize the Sy Store array'); 

writeln ('Choose the value of sigma'); writeln; 
writeln (m,'Choose the value of Sone writeln(m) ; 
readln (sigma); writeln (sigma); writeln(m,sigma) ; 
writeln; writein(m); en. 
writeln ('Now enter the initial estimates of'); 
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writeln (m,'Now enter the initial estimates of'); 
writeln ('the vector theta in order requested'); writeln; 
writeln (m,'the vector theta in order requested'); writeln(m) ; 
for 1 := 1 to dimension do 

begin | 

writeln ae eee Se 

writeln (i, Dhe@ ta is ee 


readin (a(.i,dimension + 3h, writeln; writeln(m); 
writeln fal. +, dimeneae| +1. \. writeln; 
es Coe m,a(.i,dimension + on writein(m); 
end; 
Writeln eread the block length desired'); 
writeln (m,'Enter the block length desired'); 
readin (block_length) ; 
Writeln block tenge 74); 
Writeln (m,block_length:4) ; 
for 1 := 1 to dimension do 
for j := 1 to dimension do 
a sip Je) ee ao 
for i1:= 1 to dimension do 
begin | | 
a(.i,i.) = stoma 92k. . ene 
write (m,a(.i,dimension + 1.):9:4);  (*write initial values*) 
oo Seed + 1.) := sigma * a(.i,dimension + 1.); 
ena: 
1 := 0; 


end: 
(KAAKARA AKA ARERR AIK ARK RAK KAKA KER ERIK IKKE RIKER KR ARK ) 


(HARIKA II AA IR AAI KAKA AIR AA IR IA IK RAK AAR KIRK FAK KARA KIKI AAA K KARR ARAKI) 
Procedure reinitialize (sigma: real); 


var. 
1,3: integer, 


begin 
for 1 := 1 to dimension do 
for j:= 1 to plusone do 
AZ (eae) s= 0; 
for 1 := 1 to dimension do 
a (.i, plusone.) += sigma * theta (712); 
for 1 := 1 to dimension do 
for j := 1 to dimension do 
a (.1,j.) := Oi; 
for i := 1 to dimension do 
a(.i,i.) := sigma * 1; 


end 
(KAKA IK RAI AK IK KKK ARIK IKK KK IKK AKAIKE KIKI IK IKK IKARIA ER AEK ERE KKK KEKE ) 


KAAK ERIKA KKK AKER AR RAKE RAKE RAK RRIKKAK RRA KERR AREA KERAR RK ARK REK KERR ) 
Procedure internal (x:real; var y,yout:real; var gammain, gammaout: 
gamma_vector) ; 


(* This procedure performs the rotation on the x-y pair, given the 
amma vector which contains the number and directions in which 
o rotate. The new values of the x-y pair are passed out * 


var, 
leeineegen. 
old_x, old_y: real; 


Procedure scale (var x,y: real; magfactor: real); 


begin 
X:= x * magfactor; 
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y:= y * magfactor; 
end; 


Function two (i:integer):real; 
(eiters function ae the exponentiation of the integer 2 to 


the power ( - 
var 
Nes integer- 
as real; 
begin 
ros 4; 
1f 1 = 0 then two := 1 
else for n := 1 to i do 
Yres=r ily 
two := Lr; 
end; 
begin 


1f gammain.pi then 
egin X:= -K; y:= -y; end; 
old_x := x; old_y := y; 


Los : 
while (gammain.params({.i.) <> 9) and (i < gammacount) do 


in 
if ganmain.params(.i.) <> 0 then 
egin . | 
x := old_x + seme ener tee | Z See A710 lduys 
y := old_y - gammain.params(.i.) * two(i) * old_x; 
Old_x := x; old_y := y; 
end; 
1:=1+1; 
end; 
scale (x,y, gammain.scale_factor); 
yout := y; := xX? 


a ( 
: gammaout := gammain; 
end; 
(RAAKKAAAKKKARKKRKRKAK KKK RKKRKKKKKKKAKK KKK KKK KAR AK KARR KKK K RR AIK ) 


RAKKKAKKRKKKAKKARKKKAAKKK KAKA RKRKKAARKAK KKK AKKKKK AAA KKKKAKERERERKAKKRE ) 
Procedure edge (x:real; var y: real; var gamma: gamma_vector) ; 


(* This procedure builds the gamma vector based upon the values of 
the Bey Palas lgescqamnd Vector econtalns@only the values -1, 0, 1, 
Oi tole cient iemec Cor ES) Compe GOtated councerclockwise. 
Piotieectnien LuemuSe beerotaceccrockyise. Ihe value isiset to 
zero 1f the next rotation would cause the new absolute value of 
y to be greater than the previous absolute value of y. This way, 
we can prevent the rotation from taking place and cause the 
Vaures topconverge quickly. The value of 9 is placed into the 
gamma vector once a pre-determined lower limit of y is reached, 
and it signals that no more rotations are to take place. *) 


const — 
low_limit = le-6; 
var | . 
1 : integer; 
temp_x, temp_y: real; 
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Function two (i:integer):real; 


(* This funceron calculates, the exponentiation of the integer 
Z2 to the power ( -1 ) *) 


var 
n: integer; 
r: real; 
begin 
r := 1; 
if 1 = 0 then ave := 1 
else for n a EC VE dG 
rsw=r on 
two ;:= Yr; 
end; 
Procedure one_rotation (x,y: real; var temp_x, temp_y: real); 
(* This procedure is used to do a rotation on the x-y pair. 
The input values of x and y are not changed, but the 
rotated values are passed out as temp_x and temp_y. x) 
var 
temp_gamma: integer; 
begin 
if y > low_limit then temp_gamma := 1 
rad temp_gamma := -l; 
temp_x := x + temp_ gamma * Bug ee 
temp_y := y - temp_gamma * two * x; 
end; 
begin 
if x Kn ‘0 then begin (* initialize if x <0 %) 
KX 3S0Sk?; Voer= =v paamna pl := true; end 
else gamma. pi := fals 
gamma.scale_factor = oe (x eee alae scale factor *) 
oe (abs(y) > low _limit) and (i < gammacount) d 
egin 
oe rotation (x,y,temp_x, temp_y); ra check to see what rotated *) 
* values of x,y will be *) 
while (abs(y) < abs(temp_y)) and (4 < gammacount) do 
begin Bere aG this loop until new x 
oaer Ee eae i.) := 0; ‘ y<oldy * 
one ocean (x,y,temp_x, temp_y); 
en 
if y > low. limit then gamma.params(.i.) := 1 (*do CW rotation * 
else if 3 “low_limit then gamma. params(. i.) := -1; (* CCW * 
1:= itl; 
x := temp_x; := temp_y; * update the rotated values*) 
if y > low. eae ae gamma.params ee 
else if y < -low_limit then gamma. «params ( 
if i= 1 then gamma.scale_factor:= i7sart(2) oe apaate scaling *) 
else gamma.scale_factor:= gamma.scale_factor 
cos(arctan(two(i- 1))); 
end; Oe while x) 
gamma. -params (| dies) 9; ti end of rotations *) 
ae := x * gamma. scale_ factor; * scale final values *) 


en 
(x So i HSE ERI I ISI IIIA aI IRI IIa ROR I I aR RI I RRR IO.) 
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begin (* main *) 
reset (h, ‘data text al'); 


rewrite(m, ‘outfile text al'); 
imiveraraze (sigma, blocks See 
for timer := 1° to timerstop 
begin 
ge nore := 1 to block_length do 
egin 
for j := 1 to co + 1 do 
read h, GeO) le): 
readin 
for k := 2 to 2 * dimension + 1 do 
begin . 
for 1 := 1 to dimension do 
eo, 
if ai<= 35 and (j <= dimension + 1) then 
egin 
1fil=j —. 
; edge ae ce Te) al adegin))  Cammaq ed, ) «)) 
e 
DEM a Un enlin lanes ema (eid); ue Pat 
amma (eee ), gamma(.i,j.)); 
end; ee ee > 
end; (* for i = 1 to dimension * ) 
end; CX for k * 
end; (* for index *) 


(* solve system *) 


oo .1,1.) := a(.dimension, dimension.); e shift bottom of * 
a2(.1,plusone.) := a(. dimension, plusone. 3 * a into a2 


theta (. dimension.) := a2(.1,plusone.) / a2(.1,1.); 
count := dimension - 1; 
while count >= 1 do 
begin 
pot = qcrnens ton downto 2 do (* shift down a matrix *) 
or := 1 to plusone do (* and feed into u x ) 
at, jee = a(.3-1,7 Kk.) 
for j 1°to dimension do . 
i2(.0 ,jJ-) :® a(.dimension, plusone - j.); 
UZ. 0 /plusone. ) := a(. dimension, plusone Sir 
for k := 2 to 2 * dimension + 1 do 
begin 
for i := 1 to dimension do 
sear 


A (a <= ae 55 and (j <= dimension + 1) then 
egin | 

if 1 = j then. i ho 
edge (UZ (1-1), a2(al, ge) gammaz( .1,).)) 


2 
; ° , 2 ) 4 ‘ < 
internal (u2(.i-1, “Gani BCE SFiS Pee j-)); 


eS (Seaford x 1 to dimension ~*) 
CARLO Te sie s 
eet a count.) := a2 (.dimension - count + 1, plusone.) / 
a imension - count + 1, dimension - count + 1. 
count := count saimelig 
end; (* while * 
for i := 1 to dimension do (* write out values x ) 
write (m, theta (.i.):9:4); 


reinitialize sigma) ; 
end; (x for timer *) 
end. 
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